webui: remove client-side context pre-check and rely on backend for limits #16506
Conversation
Just a few cosmetic changes 😄 Also, could you add screenshots/videos to the PR description comparing before and after the change? It would be great context when looking back at this later.
(Two review threads on tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte, both resolved and now outdated.)
I’d love to make a temporary mini version of the model selector: just a simple field in Settings to declare the model in the JSON request. That way my llama-swap would work on master, and I could record videos of the master branch more easily!
I’ve added two videos, running on my Raspberry Pi 5 (16 GB) with Qwen3 30B A3B, fully synced with the master branch. You can see the bug where I got stuck: once the context overflows, the interface is completely blocked until you hit F5. With the current PR build it’s much better: if a message block is too large, it can still slip into the context and then needs to be deleted manually, but since the backend decides, the UI never fully blocks. We could still improve it a bit by preventing oversized messages from being sent into the context in the first place; see the soft-guard sketch below.
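A minimal sketch of what such a non-blocking soft guard could look like; `estimateTokens`, `contextSize`, and `notifyUser` are hypothetical names for illustration, not the actual webui API:

```ts
// Hypothetical soft guard: warn before sending a likely-oversized message,
// but never block; the backend stays the source of truth for limits.

// Rough heuristic: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function warnIfLikelyOversized(
  message: string,
  contextSize: number, // assumed to be reported by the server
  notifyUser: (msg: string) => void
): void {
  if (estimateTokens(message) > contextSize) {
    notifyUser(
      'This message may exceed the context window; the server will decide.'
    );
  }
}
```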
Toolcall testing (Node.js proxy): Google.what.the.weather-AVC-750kbps.mp4
@ServeurpersoCom Curious: are you doing some OCR in the last video to detect text elements in the screenshots? Would love to learn more, but maybe after the PR is reviewed, to avoid going off-topic.
@ServeurpersoCom let's just rebuild the webui static output fresh and we're good to go :)
…imits

Removed the client-side context window pre-check and now simply send messages, keeping the dialog imports limited to core components and eliminating the maximum-context alert path.

Simplified streaming and non-streaming chat error handling to surface a generic 'No response received from server' error whenever the backend returns no content.

Removed the obsolete maxContextError plumbing from the chat store so state management now focuses on the core message flow without special context-limit cases.
Co-authored-by: Aleksander Grygier <[email protected]>
Co-authored-by: Aleksander Grygier <[email protected]>
…Screen.svelte Co-authored-by: Aleksander Grygier <[email protected]>
…Screen.svelte Co-authored-by: Aleksander Grygier <[email protected]>
Force-pushed from 28badc5 to be85c24
@ServeurpersoCom Actually, I will improve the UI/UX of the new Alert Dialog in a separate PR so that we don't block this change :)
Not OCR: the proxy just parses streamed text and DOM elements in real time. The model actually sees the entire page: it can analyze the full DOM and reach elements outside the viewport through an abstraction layer that simulates human actions (scroll, click, type).
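A rough illustration of what that abstraction layer's output could look like; every name and shape here is a guess for illustration, not the actual proxy code:

```ts
// Illustrative only: each interactable element gets a numeric label plus its
// bounding box, so the model can address elements by label even when they
// sit outside the viewport.
interface LabeledElement {
  label: number; // identifier the model uses to refer to the element
  tag: string;   // e.g. 'button', 'input', 'a'
  text: string;  // visible text
  box: { x: number; y: number; w: number; h: number };
}

// Serialize labeled elements into compact text tokens for the LLM prompt.
function serializeForModel(elements: LabeledElement[]): string {
  return elements
    .map(
      (e) =>
        `[${e.label}] <${e.tag}> "${e.text.slice(0, 40)}" ` +
        `@ (${e.box.x},${e.box.y} ${e.box.w}x${e.box.h})`
    )
    .join('\n');
}
```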
Awesome, can’t wait to see your pure Svelte touch on that dialog 😄
Nice. So this seems like some sort of ingenious way to control a (headless?) browser with an LLM, and the images in the WebUI are just "progress reports" from the browser. It's a bit over my head, but it definitely looks interesting.
Exactly, but not headless: a full real browser with GPU capability (inside a software box)! The goal is to convert the DOM (with all bounding boxes) into labeled text tokens for the LLM. Idea: we could add a small module in llama.cpp that exposes every ToolCall event through a user-defined HTTP hook; that would let anyone easily connect their model to external actions or systems! (Rough sketch below.)
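A minimal Node.js sketch of what such a hook receiver could look like; the `/toolcall` path and the payload shape are assumptions, since no such hook exists in llama.cpp today:

```ts
// Hypothetical receiver for the proposed user-defined HTTP hook: llama.cpp
// would POST every ToolCall event here, and this endpoint dispatches it.
import { createServer } from 'node:http';

createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/toolcall') {
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      // Assumed payload shape, e.g. { name, arguments }.
      const event = JSON.parse(body);
      console.log('tool call:', event.name, event.arguments);
      // Dispatch to external systems here (browser control, automation, ...).
      res.writeHead(204).end();
    });
  } else {
    res.writeHead(404).end();
  }
}).listen(8088, () => console.log('toolcall hook listening on :8088'));
```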
* origin/master: (32 commits)
  metal : FA support F32 K and V and head size = 32 (ggml-org#16531)
  graph : support cacheless embeddings with FA and iSWA (ggml-org#16528)
  opencl: fix build targeting CL 2 (ggml-org#16554)
  CUDA: fix numerical issues in tile FA kernel (ggml-org#16540)
  ggml : fix build broken with -march=armv9-a on MacOS (ggml-org#16520)
  CANN: fix CPU memory leak in CANN backend (ggml-org#16549)
  fix: add remark plugin to render raw HTML as literal text (ggml-org#16505)
  metal: add support for opt_step_sgd (ggml-org#16539)
  ggml : fix scalar path for computing norm (ggml-org#16558)
  CANN: Update several operators to support FP16 data format (ggml-org#16251)
  metal : add opt_step_adamw and op_sum (ggml-org#16529)
  webui: remove client-side context pre-check and rely on backend for limits (ggml-org#16506)
  [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (ggml-org#16521)
  ci : add Vulkan on Ubuntu with default packages build (ggml-org#16532)
  common : handle unicode during partial json parsing (ggml-org#16526)
  common : update presets (ggml-org#16504)
  ggml : Fix FP16 ELU positive branch (ggml-org#16519)
  hparams : add check for layer index in is_recurrent (ggml-org#16511)
  ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (ggml-org#16518)
  CUDA: faster tile FA, add oob checks, more HSs (ggml-org#16492)
  ...
webui: remove client-side context pre-check and rely on backend for limits
Removed the client-side context window pre-check; messages are now simply sent, with the dialog imports kept limited to core components, eliminating the maximum-context alert path.

Simplified streaming and non-streaming chat error handling to surface a generic 'No response received from server' error whenever the backend returns no content (sketched below).

Removed the obsolete maxContextError plumbing from the chat store, so state management now focuses on the core message flow without special context-limit cases.
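A minimal sketch of that simplified error path, assuming an OpenAI-style response shape; the function and field names are illustrative, not the actual webui code:

```ts
// If the backend returns no content (context overflow or anything else),
// surface one generic error instead of a client-side context pre-check.
async function sendChat(url: string, payload: unknown): Promise<string> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  const data = await res.json().catch(() => null);
  const content = data?.choices?.[0]?.message?.content;
  if (!content) {
    // The backend decided; the UI never blocks on a local check.
    throw new Error('No response received from server');
  }
  return content;
}
```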
Closes #16437
Master branch:
https://github.com/user-attachments/assets/edc1337d-2e19-4f99-a7ba-78f40146022f
This PR (don't mind the Model Selector):
https://github.com/user-attachments/assets/e9952e04-e189-434f-8536-84184193d704